We want to study a time series and produce forecasts for it.
The series is the monthly passenger traffic of an airline from January 2011 to August 2019.
Our forecasts cover the first 8 months of 2019.
Import the data; we keep only the value column:
library(readr)
data <- read_delim("Trafic-voyageurs.csv",
delim = ";", locale = locale(encoding = "ISO-8859-1"))
Rows: 104 Columns: 2
── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ";"
chr (1): dates
dbl (1): trafic
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(data)
dates trafic
Length:104 Min. :220876
Class :character 1st Qu.:297154
Mode :character Median :355178
Mean :354651
3rd Qu.:407331
Max. :505190
data_value <- data[,2]
Create the time series object:
data_ts <- ts(data_value, start=2011, frequency=12)
plot_1_TimeSeries(data_ts)
# note: check the display, it does not cover all of 2019
data_ts_train <- window(data_ts, start = c(2011, 1), end = c(2018,12))
data_ts_test <- window(data_ts, start= c(2019,1), end = c(2019,8))
data_ts_train
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2011 245900 238000 263227 277991 286691 303085 256800 222012 279870 262482 232680 250491
2012 253207 243067 285105 281615 295914 318227 269288 220876 292226 282423 259679 269736
2013 297836 298520 321846 315499 320499 353498 297567 248162 318171 327165 290611 277998
2014 329673 323638 376079 357470 356857 381665 329881 285621 346323 349668 312503 339102
2015 344498 344439 386410 362362 343875 401484 352111 309220 377684 382437 338143 356961
2016 361309 367323 412595 405576 384740 402523 378002 322954 403606 414451 363939 395146
2017 399926 394249 442815 427597 421828 458617 405451 349186 433709 436849 400391 403614
2018 415292 423665 478207 443548 464162 457944 440436 366272 457318 460735 413900 426097
data_ts_test
Jan Feb Mar Apr May Jun Jul Aug
2019 443700 441499 480649 463680 453372 505190 445332 370211
plot(data_ts, xlim=c(2011,2020))
lines(data_ts_test, col=3)
legend("topleft", lty = 1, col=c(1,3), legend=c("Training series", "Test series"))
-> strong trend -> a repeating pattern: seasonality?
Analyze the seasonality by overlaying the years (month by month):
-> once the trend is removed, the seasonality shows clearly => regular seasonality
ggseasonplot(data_ts)
data_ts_without_trend = diff(data_ts)
ggseasonplot(data_ts_without_trend)
DECOMPOSITION: additive, Ts = Trend + Seasonal + Random, or multiplicative, Ts = Trend * Seasonal * Random
decomposed_data <- decompose(data_ts_train, type="additive")
plot(decomposed_data$trend)
plot(decomposed_data$seasonal)
plot(decomposed_data$random)
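R's decompose(type="additive") estimates the trend with a centered moving average, averages the detrended values by month to get the seasonal component, and calls the rest random. A minimal Python sketch of the same idea (illustrative only, not the actual decompose() source):

```python
import numpy as np

def classical_additive_decompose(y, period=12):
    """Sketch of classical additive decomposition: centered 2x12 moving
    average for the trend, monthly means of the detrended series for the
    seasonal component, remainder = y - trend - seasonal."""
    y = np.asarray(y, dtype=float)
    half = period // 2
    w = np.ones(period + 1)
    w[0] = w[-1] = 0.5
    w /= period                                  # centered 2x12 MA weights
    trend = np.convolve(y, w, mode="same")
    trend[:half] = np.nan                        # edges lack a full window
    trend[-half:] = np.nan
    detrended = y - trend
    seasonal_means = np.array(
        [np.nanmean(detrended[i::period]) for i in range(period)])
    seasonal_means -= seasonal_means.mean()      # seasonal effects sum to zero
    seasonal = np.tile(seasonal_means, len(y) // period + 1)[:len(y)]
    return trend, seasonal, y - trend - seasonal
```

On a noiseless trend-plus-sine series this recovers the components almost exactly, with a near-zero remainder.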
boxplot(data_ts ~ cycle(data_ts))
-> distinct seasonal effects => a plain regression makes no sense => use the Buys-Ballot model
-> the noise is well spread -> a few outliers
checkresiduals(remainder(decomposed_data))
Warning in modeldf.default(object) :
Could not find appropriate degrees of freedom for this model.
We have trend + seasonality.
https://mpra.ub.uni-muenchen.de/77718/1/MPRA_paper_77718.pdf page 175
The Buys-Ballot approach introduces indicator (dummy) variables, one for each season defined by the observation cycle. For quarterly data, 4 indicator variables are included; for monthly data, 12.
The model is then estimated (without an intercept) on these indicator variables.
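A numerical sketch of the Buys-Ballot idea in Python (numpy only; the synthetic data and helper name are illustrative): regress the series on a linear trend plus 12 monthly dummies, with no intercept, so each dummy coefficient absorbs its month's level.

```python
import numpy as np

def buys_ballot_fit(y, period=12):
    """Least-squares fit of y_t = a*t + c_{month(t)}: a linear trend
    plus one indicator (dummy) per season, with no intercept."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n + 1, dtype=float)
    dummies = np.zeros((n, period))
    dummies[np.arange(n), np.arange(n) % period] = 1.0
    X = np.column_stack([t, dummies])            # trend column + 12 dummies
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef                        # (slope, 12 levels), fitted values

# synthetic monthly series: trend of ~2000/month plus a seasonal sine
rng = np.random.default_rng(0)
m = np.arange(96)
y = 250_000 + 2_000 * m + 30_000 * np.sin(2 * np.pi * m / 12) \
    + rng.normal(0, 5_000, 96)
coef, fitted = buys_ballot_fit(y)   # coef[0] recovers the trend slope
```

The fitted slope lands close to the true 2000/month despite the noise, and the 12 dummy coefficients track the seasonal levels.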
library(forecast)
mean <- meanf(data_ts_train, h=8)
naivem <- naive(data_ts_train, h=8)
driftm <- rwf(data_ts_train, h=8, drift = TRUE)
snaivem <- snaive(data_ts_train, h=8)
plot(mean, PI = FALSE, main="")
lines(naivem$mean, col=2, lty=1)
lines(driftm$mean, col=5, lty=1)
lines(snaivem$mean, col = 4, lty=1)
legend("topleft", lty=1, col=c(1,2,5,4), legend=c("Mean Method", "Naive Method", "Drift Method", "Seasonal Naive"))
# comparison:
plot(snaivem, PI = FALSE, main="")
lines(data_ts_test, col = 6, lty=1, lwd=3)
plot(driftm, PI = FALSE, main="")
lines(data_ts_test, col = 6, lty=1, lwd=3)
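The four benchmarks used above (meanf, naive, rwf with drift, snaive) reduce to very simple rules; a Python sketch under those definitions:

```python
import numpy as np

def benchmark_forecasts(y, h=8, period=12):
    """The four classical benchmarks: mean, naive, drift, seasonal naive
    (counterparts of meanf, naive, rwf(drift=TRUE) and snaive)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    mean_f   = np.full(h, y.mean())                    # average of history
    naive_f  = np.full(h, y[-1])                       # last observation
    slope    = (y[-1] - y[0]) / (n - 1)                # average step (drift)
    drift_f  = y[-1] + slope * np.arange(1, h + 1)     # last obs + trend line
    snaive_f = np.array([y[n - period + i % period] for i in range(h)])
    return mean_f, naive_f, drift_f, snaive_f

# 24 "months" of values 1..24: drift extrapolates, snaive repeats last year
m, nv, dr, sn = benchmark_forecasts(np.arange(1.0, 25.0))
```

On this toy input the mean forecast is flat at 12.5, the naive forecast repeats 24, the drift line continues 25, 26, …, and the seasonal naive replays 13..20.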
We look at: MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), MASE (Mean Absolute Scaled Error), MAPE (Mean Absolute Percentage Error).
res = pred - val; MAE = sum(abs(res))/length(val); RSS = sum(res^2); MSE = RSS/length(val); RMSE = sqrt(MSE)
The most popular is the MAPE, computed here with MAPE() from the MLmetrics package:
library(MLmetrics)
MAPE(y_pred, y_true)
MAPE = (1/n) * Σ(|actual - forecast| / |actual|) * 100. "A MAPE value of 6% means that the average difference between the forecasted value and the actual value is 6%."
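The formula is a one-liner; a Python sketch (note that MLmetrics' MAPE() returns a fraction, which is why the R calls below multiply by 100, while this version returns a percentage directly):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent:
    average of |actual - forecast| / |actual|, times 100."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred) / np.abs(y_true)) * 100

print(mape([200, 400], [150, 300]))  # → 25.0 (both forecasts off by 25%)
```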
print(summary(mean))
Forecast method: Mean
Model Information:
$mu
[1] 346667.1
$mu.se
[1] 6731.642
$sd
[1] 65956.35
$bootstrap
[1] FALSE
$call
meanf(y = data_ts_train, h = 8)
attr(,"class")
[1] "meanf"
Error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 1.941958e-11 65611.93 55535.08 -3.855657 17.01186 2.16177 0.8254447
Forecasts:
checkresiduals(mean)
Ljung-Box test
data: Residuals from Mean
Q* = 731.64, df = 18, p-value < 2.2e-16
Model df: 1. Total lags used: 19
accuracy(mean, data_ts_test)
ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
Training set 1.941958e-11 65611.93 55535.08 -3.855657 17.01186 2.161770 0.8254447 NA
Test set 1.037870e+05 110031.92 103787.04 22.486144 22.48614 4.040036 0.0485288 2.517689
print(summary(naivem))
Forecast method: Naive method
Model Information:
Call: naive(y = data_ts_train, h = 8)
Residual sd: 36679.9508
Error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 1896.811 36679.95 29013.27 -0.02007386 8.597313 1.129377 -0.2744236
Forecasts:
checkresiduals(naivem)
Ljung-Box test
data: Residuals from Naive method
Q* = 248.52, df = 19, p-value < 2.2e-16
Model df: 0. Total lags used: 19
accuracy(naivem, data_ts_test)
ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
Training set 1896.811 36679.95 29013.27 -0.02007386 8.597313 1.129377 -0.2744236 NA
Test set 24357.125 43915.19 38328.62 4.72582164 8.499751 1.491988 0.0485288 1.063155
print(summary(driftm))
Forecast method: Random walk with drift
Model Information:
Call: rwf(y = data_ts_train, h = 8, drift = T)
Drift: 1896.8105 (se 3778.1861)
Residual sd: 36825.2032
Error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 2.297696e-11 36630.87 28899.04 -0.5861884 8.591266 1.12493 -0.2744236
Forecasts:
checkresiduals(driftm)
Ljung-Box test
data: Residuals from Random walk with drift
Q* = 248.52, df = 18, p-value < 2.2e-16
Model df: 1. Total lags used: 19
accuracy(driftm, data_ts_test)
ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
Training set 2.297696e-11 36630.87 28899.04 -0.5861884 8.591266 1.124930 -0.27442358 NA
Test set 1.582148e+04 41314.60 33586.60 2.7843152 7.582963 1.307399 0.06907259 1.007801
print(summary(snaivem))
Forecast method: Seasonal naive method
Model Information:
Call: snaive(y = data_ts_train, h = 8)
Residual sd: 28666.7301
Error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 25337.46 28666.73 25689.63 7.101745 7.207375 1 0.2695124
Forecasts:
checkresiduals(snaivem)
Ljung-Box test
data: Residuals from Seasonal naive method
Q* = 35.426, df = 19, p-value = 0.0124
Model df: 0. Total lags used: 19
accuracy(snaivem, data_ts_test)
ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
Training set 25337.46 28666.73 25689.63 7.101745 7.207375 1.0000000 0.2695124 NA
Test set 14263.38 22148.43 16960.88 3.053421 3.648407 0.6602226 -0.5427745 0.4792835
fcst_se <- ses(data_ts_train, h = 8)
print(summary(fcst_se))
Forecast method: Simple exponential smoothing
Model Information:
Simple exponential smoothing
Call:
ses(y = data_ts_train, h = 8)
Smoothing parameters:
alpha = 0.2559
Initial states:
l = 258126.0245
sigma: 31480.96
AIC AICc BIC
2430.727 2430.988 2438.420
Error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 7057.127 31151.31 25752.89 1.326234 7.684321 1.002462 0.0711143
Forecasts:
checkresiduals(fcst_se)
Ljung-Box test
data: Residuals from Simple exponential smoothing
Q* = 144.66, df = 17, p-value < 2.2e-16
Model df: 2. Total lags used: 19
plot(fcst_se)
lines(data_ts_test, col="red")
df_se = as.data.frame(fcst_se)
predict_value_se <- df_se$`Point Forecast`
MAPE(predict_value_se, data_ts_test)*100
[1] 7.658334
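The SES recursion behind ses() is tiny; a simplified Python sketch (this ignores ses()'s likelihood-based estimation of alpha and the initial level, which the output above reports as alpha ≈ 0.256 and l ≈ 258126):

```python
def ses_forecast(y, alpha, level0):
    """Simple exponential smoothing: l_t = alpha*y_t + (1-alpha)*l_{t-1};
    every h-step-ahead forecast equals the final smoothed level."""
    level = level0
    for value in y:
        level = alpha * value + (1 - alpha) * level
    return level

print(ses_forecast([0, 10], alpha=0.5, level0=0))  # → 5.0
```

With no trend or seasonal terms, all forecast horizons share one flat value, which is why SES tracks this strongly seasonal series poorly (MAPE ≈ 7.7% above).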
Fit an ETS model: ets() automatically finds the best exponential smoothing specification.
fit_ets <- ets(data_ts_train)
print(summary(fit_ets))
ETS(A,A,A)
Call:
ets(y = data_ts_train)
Smoothing parameters:
alpha = 0.1568
beta = 1e-04
gamma = 1e-04
Initial states:
l = 248267.1099
b = 2163.3982
s = -17928.3 -29535.73 9295.935 11005.81 -57117.85 -7708.17
38272.64 14592.34 16899.53 34763.15 -7344.204 -5195.15
sigma: 11014.45
AIC AICc BIC
2241.611 2249.458 2285.205
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set -458.6799 10054.77 7831.554 -0.2623253 2.371375 0.3048527 0.09626331
checkresiduals(fit_ets)
Ljung-Box test
data: Residuals from ETS(A,A,A)
Q* = 7.1794, df = 3, p-value = 0.06639
Model df: 16. Total lags used: 19
fcst_ets <- forecast(fit_ets, h=8)
plot(fcst_ets)
lines(data_ts_test, col="red")
df_ets = as.data.frame(fcst_ets)
predict_value_ets = df_ets$`Point Forecast`
MAPE(predict_value_ets, data_ts_test)*100
[1] 3.005848
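auto.arima below is run with d=1 and D=1: the first difference removes the trend, and the lag-12 seasonal difference removes the seasonal pattern, leaving a stationary series. A quick numeric illustration on synthetic, noiseless data:

```python
import numpy as np

# a noiseless series with a linear trend and period-12 seasonality
t = np.arange(120)
y = 10.0 * t + 50.0 * np.sin(2 * np.pi * t / 12)

d1  = np.diff(y)            # d=1: first difference removes the linear trend
d12 = d1[12:] - d1[:-12]    # D=1: lag-12 difference removes the seasonality
print(np.allclose(d12, 0))  # → True: nothing left once trend and season go
```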
# returns the best parameters
# d=1 removes the trend
# D=1 removes the seasonality
# => yields stationary data
# trace=TRUE: show the models that were tried
fit_arima <- auto.arima(data_ts_train, d=1, D=1, stepwise = FALSE, approximation = FALSE, trace=TRUE)
ARIMA(0,1,0)(0,1,0)[12] : 1846.398
ARIMA(0,1,0)(0,1,1)[12] : 1833.134
ARIMA(0,1,0)(0,1,2)[12] : 1835.211
ARIMA(0,1,0)(1,1,0)[12] : 1833.056
ARIMA(0,1,0)(1,1,1)[12] : 1835.09
ARIMA(0,1,0)(1,1,2)[12] : Inf
ARIMA(0,1,0)(2,1,0)[12] : 1835.207
ARIMA(0,1,0)(2,1,1)[12] : 1837.012
ARIMA(0,1,0)(2,1,2)[12] : 1836.461
ARIMA(0,1,1)(0,1,0)[12] : 1814.951
ARIMA(0,1,1)(0,1,1)[12] : 1801.155
ARIMA(0,1,1)(0,1,2)[12] : 1803.362
ARIMA(0,1,1)(1,1,0)[12] : 1803.592
ARIMA(0,1,1)(1,1,1)[12] : 1803.361
ARIMA(0,1,1)(1,1,2)[12] : Inf
ARIMA(0,1,1)(2,1,0)[12] : 1805.004
ARIMA(0,1,1)(2,1,1)[12] : 1805.397
ARIMA(0,1,1)(2,1,2)[12] : Inf
ARIMA(0,1,2)(0,1,0)[12] : 1816.915
ARIMA(0,1,2)(0,1,1)[12] : 1803.033
ARIMA(0,1,2)(0,1,2)[12] : 1805.296
ARIMA(0,1,2)(1,1,0)[12] : 1805.702
ARIMA(0,1,2)(1,1,1)[12] : 1805.295
ARIMA(0,1,2)(1,1,2)[12] : Inf
ARIMA(0,1,2)(2,1,0)[12] : 1807.026
ARIMA(0,1,2)(2,1,1)[12] : 1807.441
ARIMA(0,1,3)(0,1,0)[12] : 1817.787
ARIMA(0,1,3)(0,1,1)[12] : Inf
ARIMA(0,1,3)(0,1,2)[12] : Inf
ARIMA(0,1,3)(1,1,0)[12] : Inf
ARIMA(0,1,3)(1,1,1)[12] : Inf
ARIMA(0,1,3)(2,1,0)[12] : Inf
ARIMA(0,1,4)(0,1,0)[12] : 1820.052
ARIMA(0,1,4)(0,1,1)[12] : Inf
ARIMA(0,1,4)(1,1,0)[12] : Inf
ARIMA(0,1,5)(0,1,0)[12] : Inf
ARIMA(1,1,0)(0,1,0)[12] : 1825.579
ARIMA(1,1,0)(0,1,1)[12] : 1812.512
ARIMA(1,1,0)(0,1,2)[12] : 1814.657
ARIMA(1,1,0)(1,1,0)[12] : 1813.2
ARIMA(1,1,0)(1,1,1)[12] : 1814.614
ARIMA(1,1,0)(1,1,2)[12] : 1816.192
ARIMA(1,1,0)(2,1,0)[12] : 1815.227
ARIMA(1,1,0)(2,1,1)[12] : Inf
ARIMA(1,1,0)(2,1,2)[12] : 1817.796
ARIMA(1,1,1)(0,1,0)[12] : 1816.841
ARIMA(1,1,1)(0,1,1)[12] : 1802.853
ARIMA(1,1,1)(0,1,2)[12] : 1805.117
ARIMA(1,1,1)(1,1,0)[12] : 1805.653
ARIMA(1,1,1)(1,1,1)[12] : Inf
ARIMA(1,1,1)(1,1,2)[12] : Inf
ARIMA(1,1,1)(2,1,0)[12] : Inf
ARIMA(1,1,1)(2,1,1)[12] : Inf
ARIMA(1,1,2)(0,1,0)[12] : 1819.234
ARIMA(1,1,2)(0,1,1)[12] : 1805.22
ARIMA(1,1,2)(0,1,2)[12] : 1807.539
ARIMA(1,1,2)(1,1,0)[12] : 1807.381
ARIMA(1,1,2)(1,1,1)[12] : 1807.538
ARIMA(1,1,2)(2,1,0)[12] : 1808.925
ARIMA(1,1,3)(0,1,0)[12] : 1820.05
ARIMA(1,1,3)(0,1,1)[12] : 1806.055
ARIMA(1,1,3)(1,1,0)[12] : 1808.732
ARIMA(1,1,4)(0,1,0)[12] : Inf
ARIMA(2,1,0)(0,1,0)[12] : 1824.435
ARIMA(2,1,0)(0,1,1)[12] : 1811.07
ARIMA(2,1,0)(0,1,2)[12] : 1813.287
ARIMA(2,1,0)(1,1,0)[12] : 1811.619
ARIMA(2,1,0)(1,1,1)[12] : 1813.247
ARIMA(2,1,0)(1,1,2)[12] : Inf
ARIMA(2,1,0)(2,1,0)[12] : 1813.821
ARIMA(2,1,0)(2,1,1)[12] : 1815.872
ARIMA(2,1,1)(0,1,0)[12] : Inf
ARIMA(2,1,1)(0,1,1)[12] : Inf
ARIMA(2,1,1)(0,1,2)[12] : Inf
ARIMA(2,1,1)(1,1,0)[12] : Inf
ARIMA(2,1,1)(1,1,1)[12] : Inf
ARIMA(2,1,1)(2,1,0)[12] : Inf
ARIMA(2,1,2)(0,1,0)[12] : Inf
ARIMA(2,1,2)(0,1,1)[12] : Inf
ARIMA(2,1,2)(1,1,0)[12] : Inf
ARIMA(2,1,3)(0,1,0)[12] : Inf
ARIMA(3,1,0)(0,1,0)[12] : 1823.646
ARIMA(3,1,0)(0,1,1)[12] : 1808.49
ARIMA(3,1,0)(0,1,2)[12] : 1810.542
ARIMA(3,1,0)(1,1,0)[12] : 1808.594
ARIMA(3,1,0)(1,1,1)[12] : 1810.321
ARIMA(3,1,0)(2,1,0)[12] : 1810.708
ARIMA(3,1,1)(0,1,0)[12] : Inf
ARIMA(3,1,1)(0,1,1)[12] : Inf
ARIMA(3,1,1)(1,1,0)[12] : Inf
ARIMA(3,1,2)(0,1,0)[12] : Inf
ARIMA(4,1,0)(0,1,0)[12] : 1823.996
ARIMA(4,1,0)(0,1,1)[12] : 1810.199
ARIMA(4,1,0)(1,1,0)[12] : 1810.845
ARIMA(4,1,1)(0,1,0)[12] : Inf
ARIMA(5,1,0)(0,1,0)[12] : 1825.055
Best model: ARIMA(0,1,1)(0,1,1)[12]
print(summary(fit_arima))
Series: data_ts_train
ARIMA(0,1,1)(0,1,1)[12]
Coefficients:
ma1 sma1
-0.7675 -0.5465
s.e. 0.0977 0.1295
sigma^2 = 138827719: log likelihood = -897.43
AIC=1800.85 AICc=1801.16 BIC=1808.11
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 805.2378 10822.93 7747.841 0.2158443 2.213481 0.3015941 0.03453594
checkresiduals(fit_arima)
Ljung-Box test
data: Residuals from ARIMA(0,1,1)(0,1,1)[12]
Q* = 10.898, df = 17, p-value = 0.8618
Model df: 2. Total lags used: 19
fcst_arima <- forecast(fit_arima, h=8)
plot(fcst_arima)
lines(data_ts_test, col='red')
df_arima = as.data.frame(fcst_arima)
predict_value_arima = df_arima$`Point Forecast`
MAPE(predict_value_arima, data_ts_test)*100
[1] 2.814135